Structuring Content with XML
Author
Abstract
XML, as the most successful data representation format, makes it easy to start working with structured data because of the simplicity of XML documents and DTDs and the general availability of tools. This paper first describes the origin and features of XML as a markup language. The second part addresses how to use the features provided by XML for structuring content. Data modeling for electronic publishing and document engineering is a research field with many open issues, the most important being which modeling language to use for XML-based applications. While the paper does not provide a solution to the modeling language question, it provides guidelines for how to design schemas once the model has been defined.
Similar resources
Metaheuristic clustering of Persian XML documents based on structural and content similarity
Due to the increasing number of XML documents, organizing these documents effectively in order to retrieve useful information from them is essential. A possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between XML documents. Conventional clustering of text documents using a do...
MultiX: an XML based formalism to encode multi-structured documents
This paper concerns the issue of document multi-structuring. For various use objectives, many distinct structures may be defined simultaneously from the same original document. For example, a document may have both a structure for logical content organisation (logical structure), and a structure expressing a set of content formatting rules (physical structure). We have already proposed a generi...
Tools for content-based retrieval and transformation of audio using MPEG-7: the SPOffline and the MDTools
In this paper we present a set of applications for content-based retrieval and transformations of audio recordings. They illustrate diverse aspects of a common framework for music content description and structuring implemented using the MPEG-7 standard. MPEG-7 descriptions can be generated either manually or automatically, and are stored in an XML database. Retrieval services are implemented in...
pdf2table: A Method to Extract Table Information from PDF Files
Tables are a common structuring element in many documents, such as PDF files. To reuse such tables, appropriate methods need to be developed that capture both the structure and the content information. We have developed several heuristics which together recognize and decompose tables in PDF files and store the extracted data in a structured data format (XML) for easier reuse. Additionally, we implem...
Traitements automatiques pour la migration de documents numériques vers XML
More and more companies are migrating their legacy document management systems toward XML format, the industrial standard for data exchange. In order to reduce the migration cost we propose an approach aimed at automating the conversion of layout-oriented documents to semantic-oriented annotations. The conversion module uses supervised machine learning techniques to learn a conversion model for...